Abstract. The concepts and results of [1] are discussed in the light of a new result on the optimality of the minimum residual algorithm. The proof is given in Section 3. The terminology of [1] is described in Section 4.

1.  Summary

The conjugate gradient algorithm (called CG hereafter) is a popular way to solve large sparse positive definite systems of equations. The minimum residual algorithm (called MR) is closely related to CG and can be applied to any nonsingular system, see £?.], [6]. In this summary we describe our result and, of more importance, comment on its significance. The discussion is confined to exact arithmetic because this enquiry concerns only the theory of MR, CG and related algorithms, not their implementation.

Both CG and MR require between 1 and n steps to solve a given n by n nonsingular system Ax = b, and both extremes can occur. One of the attractions of these methods is that they are finite and only use A to form products Av. Moreover it is known that both CG and MR are optimal in a certain sense. More precisely, at each step each method produces a vector z which minimizes the norm of the residual vector b - Az over all z in an appropriate subspace of $E^n$. As usual $E^n$ denotes the space of all real n-dimensional column vectors. The methods differ only in their choice of norm: MR uses the Euclidean norm $\|v\|$ whereas CG uses an "energy" norm $\|A^{-1/2} v\|$. The former is independent of A but the latter has meaning only for symmetric positive definite matrices A. For the sake of simplicity and generality we shall concentrate on MR in the rest of this essay.

Most numerical analysts suppose that the general theory of MR stops at this point. Indeed, to estimate how many steps will be required to achieve a prescribed reduction in the residual norm some additional information about A, such as its condition number or some other measure of its eigenvalue distribution, must be supplied.

Recently, in [1], Traub and Wozniakowski have sought to enlarge the scope of the discussion of MR. Following the lines laid down in [2] they remove the restriction of algorithms to the standard (polynomial) class over which MR and CG are optimal. Why, they say, restrict attention to the standard (Krylov†) subspace of $\mathbb{R}^n$?

Will MR remain optimal when any rival algorithm is allowed? They address this question in [1] while erecting a new framework around the study, in general, of iterative algorithms for solving Ax = b. They succeed in showing that MR costs at most one iteration-step more than an optimum algorithm could require.

Actually MR is still optimal for positive definite A; this note will explain why, and then go on to indicate how all such results, theirs and ours, can mislead the unwary reader.

† Krylov subspaces are defined in Section 3.


2.  Discussion of Ref. [1]

To go any further we must introduce some terminology. At step j MR computes a linear combination $x_j$ of the vectors b, Ab, $A^2 b$, ..., $A^{j-1} b$. No generality is lost by the assumption (in force throughout the paper) that this set of vectors is linearly independent. Another vector $A^j b$, or something equivalent to it, is needed to determine the coefficients for MR's output $x_j$ although $A^j b$ itself is not part of the linear combination. For future reference we let $\nu_j$ denote $x_j$'s residual norm. The defining property of MR and an expression for $x_j$ are given in the following equations.

$$\begin{aligned}
\nu_j &\equiv \min_{\gamma_i} \|b - \gamma_0 Ab - \gamma_1 A^2 b - \dots - \gamma_{j-1} A^j b\| , \\
      &= \min_{\gamma_i} \|b - A(\gamma_0 b + \gamma_1 Ab + \dots + \gamma_{j-1} A^{j-1} b)\| , \\
      &= \|b - A x_j\| .
\end{aligned}$$
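The defining property above can be illustrated numerically. The following is a minimal sketch, not the MR algorithm as it is implemented in practice: it computes each $\nu_j$ as the distance from b to span{Ab, ..., $A^j b$} by orthonormalizing that span. The function name `mr_residual_norms` and the 3 by 3 test matrix are assumptions made for this illustration only.

```python
import math

def matvec(A, v):
    """Product A v for a dense matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def mr_residual_norms(A, b, steps):
    """Return [nu_1, ..., nu_steps], where nu_j is the minimum of
    ||b - A z|| over z in span{b, Ab, ..., A^(j-1) b}."""
    residual = list(b)
    basis = []            # orthonormal basis of span{Ab, ..., A^j b}
    w = matvec(A, b)      # first new direction is A b
    norms = []
    for _ in range(steps):
        if w is not None:
            scale = math.sqrt(dot(w, w))
            for q in basis:                    # Gram-Schmidt sweep
                c = dot(q, w)
                w = [wi - c * qi for wi, qi in zip(w, q)]
            nw = math.sqrt(dot(w, w))
            if nw > 1e-12 * scale:
                q = [wi / nw for wi in w]
                basis.append(q)
                c = dot(q, residual)           # remove the new component of b
                residual = [ri - c * qi for ri, qi in zip(residual, q)]
                w = matvec(A, q)               # next direction, same span as A^(j+1) b
            else:
                w = None                       # the subspace has become invariant
        norms.append(math.sqrt(dot(residual, residual)))
    return norms

# Hypothetical symmetric positive definite example with n = 3.
A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [1.0, 0.0, 0.0]
nus = mr_residual_norms(A, b, 3)
print(nus)
```

In this example the residual norms decrease monotonically and the last one vanishes to rounding error, in line with the statement that at most n steps are needed.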

The theory in [1] can be motivated by asking whether there exists some clever, well-hidden algorithm which, with no further multiplication of vectors by A, delivers a vector z with $\|b - Az\| < \nu_j$ and so surpasses MR. The answer, after a little thought, is yes. The "well hidden" algorithm is Gaussian elimination (with pivoting) which delivers $A^{-1} b$ itself without computing Av for any v! Of course we feel cheated and this reaction shows that the question must be posed with more care. Rival algorithms must be prevented from getting at A through the back door.

One way to be fair is to limit knowledge of A to the current information: b, Ab, ..., $A^j b$, or some "equivalent" set of vectors. In general these vectors do not fix A uniquely and so there is the set

$$\mathcal{A}^j = \{\tilde A : \tilde A^i b = A^i b, \ i = 1, \dots, j\}$$

of matrices indistinguishable from A with the given information. Sometimes we restrict $\mathcal{A}^j$ to a particular class of matrices A, sometimes not. The context decides.

Traub and Wozniakowski use $\mathcal{A}^j$ in specifying the cost of an algorithm for a given matrix. The technical description is given in Section 4 (along with a warning) but, in English, the cost is the minimal number of steps j required to achieve a given reduction in the residual norm for any $\tilde A$ in $\mathcal{A}^j$, i.e. the number of steps needed in the worst case. This cost, for an algorithm $\phi$, is denoted by $k(\phi, A)$. The new version of optimality on which [1] is based is optimality over a whole class F of matrices. The authors seek an algorithm $\Phi$ which satisfies

$$k(\Phi, A) = \min_{\phi} k(\phi, A) \quad \text{for each } A \text{ in } F .$$

The minimum is over all algorithms using the given information.

These are the basic concepts. There are many theorems in [1]. The main result, as mentioned above, is that over the class F = SPD, the class of symmetric positive definite matrices, and also over larger classes F, MR is within one matrix-vector multiplication of being an optimal algorithm; but [1] does not exhibit an algorithm $\Phi$ that is optimal for any of those classes F. Instead they study restricted classes and finish [1] with conjectures and open problems, as if the whole field were ripe for further development.

We doubt the ripeness. The questions being asked seem to be the wrong questions. For a start the quest for many optimal $\Phi$ is not necessary because

    MR is optimal for the class SPD,
    and for any larger class.

The proof is given in Section 3. It amounts to showing that for each rival vector $y \ne x_j$ one can construct a positive definite $\tilde A = \tilde A(y)$ in $\mathcal{A}^j$ such that

$$\|b - \tilde A y\| > \nu_j .$$

Thus y is worse than $x_j$ for $\tilde A$ and for many other matrices in $\mathcal{A}^j$. Even worse, for y not in the usual subspaces, $\{\|b - \tilde A y\| : \tilde A \in \mathcal{A}^j \cap \mathrm{SPD}\}$ is unbounded.

The devastating consequence of this observation for the theory launched by Traub and Wozniakowski is that if $\phi$ produces any approximation unrestricted to combinations of b, Ab, ..., $A^{j-1} b$ then for A's in SPD the cost $k(\phi, A) \ge n$.

In other words MR wins because all its new rivals die at the starting gate. The villain of the piece is the class SPD. It is so big that the generality, espoused in [1], of allowing any algorithm $\phi$ (and hence approximations outside the usual subspace) is annulled by the required fairness of taking the worst case in $\mathcal{A}^j$. The theory becomes vacuous.

On  the  other  hand  SPD  is  the  most  important  class  of  matrices  in  the 
application  of  iterative  methods. 

One way to try to save the theory is to restrict it to small subsets of SPD, for example to those matrices with condition number < 100. Yet even here $k(\phi, A) = n$ unless $\phi$ is very close to MR, as shown in Section 3. In those cases when $k(\phi, A) < n$ the theory in [1], though not vacuous, reduces to a translation into its terminology of standard bounds, see [1*] for example, on convergence rates of MR. How sad it is to evade the Charybdis of vacuity only to founder on the Scylla of redundancy.†

What has gone wrong? Why has such an enticing research program, on closer examination, dissolved away like the Cheshire Cat in "Alice's Adventures in Wonderland"? We are not sure but we offer the following thought.

The information $\{b, Ab, A^2 b, \dots, A^j b\}$ seems to arise as an artifact of the MR and CG algorithms, not as part of the problem of solving Ax = b by some iterative method. The optimality of MR over SPD may be true but it is no more exciting than the discovery that the dress suit made for Mr. X fits him better than it does anyone else.

For readers of this note who are not familiar with [1] we have added a final section in which its leading notions are stated, together with our comments. This will make it clear that this paper and [1] are concerned with exactly the same problem, namely the analytic complexity of methods for solving Ax = b which are restricted to the use of Krylov information on A. Whereas [1] claims that this is a new topic deserving of development we suggest that the nonvacuous aspects are well known, although they are usually expressed in more mundane language than is favored in [1].

Here, in one sentence, is our version of the analytic complexity of the problem. Over the standard matrix classes (SPD, Sym, nonsingular), among all algorithms using the given information MR is (worst case) optimal and its complexity (number of steps required) is n, which is as bad as it could be.

† From the Greek myths. Boats in the Strait of Messina had to steer a very fine line between two monsters Charybdis (a whirlpool) and Scylla (a rock).

Unless worst case analyses yield something better than worst possible outcomes our interest shifts from them to the reasons why particular cases fare so well. Except for special right hand sides b the residual reduction produced by MR depends entirely on the usually unknown distribution of eigenvalues. Significant diminutions occur after a small number of steps when the eigenvalues are grouped into a small number of clusters, even when the condition number is large. It is specific surprising results such as this which give substance to the study of MR and CG.
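The remark about clustered eigenvalues can be made concrete in the limiting case where each cluster collapses to a single value. The sketch below is an illustration under assumptions, not part of the original argument: the diagonal matrix, its eigenvalues, the right hand side, and the function name `mr_norms_diag` are all hypothetical. With three distinct eigenvalues and condition number 1000, the MR residual vanishes after three steps.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def mr_norms_diag(eigs, b, steps):
    """MR residual norms nu_1..nu_steps for the diagonal matrix diag(eigs).
    Each nu_j is the distance from b to span{Ab, ..., A^j b}."""
    residual = list(b)
    basis = []
    w = [lam * x for lam, x in zip(eigs, b)]       # A b
    norms = []
    for _ in range(steps):
        if w is not None:
            scale = math.sqrt(dot(w, w))
            for q in basis:                        # Gram-Schmidt sweep
                c = dot(q, w)
                w = [wi - c * qi for wi, qi in zip(w, q)]
            nw = math.sqrt(dot(w, w))
            if nw > 1e-10 * scale:
                q = [wi / nw for wi in w]
                basis.append(q)
                c = dot(q, residual)
                residual = [ri - c * qi for ri, qi in zip(residual, q)]
                w = [lam * x for lam, x in zip(eigs, q)]
            else:
                w = None                           # invariant subspace reached
        norms.append(math.sqrt(dot(residual, residual)))
    return norms

# Three "clusters" (here exact repeats); condition number is 1000, n = 8.
eigs = [1.0, 1.0, 1.0, 2.0, 2.0, 1000.0, 1000.0, 1000.0]
b = [1.0] * 8
cluster_norms = mr_norms_diag(eigs, b, 3)
print(cluster_norms)
```

The residual after two steps is still of order one, while the third step removes it entirely, although n = 8. Eigenvalues that are merely close to three values would give a small rather than zero residual.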


3.  Optimality  of  MR 

A  little  preparation  is  necessary  before  we  can  state  and  prove 
the  theorem. 

The only information available concerning the nonsingular n by n matrix A is the set of vectors $\{b, Ab, A^2 b, \dots, A^j b\}$, $1 \le j < n$. Actually the methods use "equivalent" combinations of these vectors which yield alternative, and more practical bases for the subspace

$$K^{j+1} = \mathrm{span}\{b, Ab, \dots, A^j b\} , \quad 1 \le j < n .$$

These subspaces of $\mathbb{R}^n$ are sometimes called Krylov subspaces.

In theory it is possible to have $A^m b$ dependent on b, Ab, ..., $A^{m-1} b$ for some m < n. In this case it is easily verified that $K^m$ is invariant under A. Moreover the exact solution $A^{-1} b$ lies in $K^m$. Consequently there is no need to look outside $K^m$ for an approximation. Theoretically A can be replaced by its restriction to $K^m$ and the change will not be noticed. So, without loss of generality, we may invoke

Assumption 1. $\{b, Ab, \dots, A^{n-1} b\}$ is linearly independent.

Now the superscripts on the $K^j$ indicate dimension.

A useful basis for $K^n$ ($= E^n$) is the Lanczos basis $\{q_1, q_2, \dots, q_n\}$ obtained by applying the Gram-Schmidt orthonormalizing process to the original ordered basis $\{b, Ab, \dots, A^{n-1} b\}$. If we introduce the n by j matrix $Q_j = (q_1, \dots, q_j)$ we have

$$Q_j^T Q_j = I_j , \ \text{the j by j identity matrix}, \quad j = 1, \dots, n . \qquad (2\text{-}1)$$

By representing A in this basis our proof becomes transparent.

The actual details of the MR algorithm are not important here. Its output $x_j$, at step j, depends on $A^j b$ but $x_j$ itself lies in $K^j$. By definition of MR $x_j$ satisfies

$$\nu_j \equiv \min_{v \in K^j} \|b - Av\| = \|b - A x_j\| . \qquad (2\text{-}2)$$

Note that the information available determines the action of A on $K^j$ but not on $K^{j+1}$ since $A \cdot A^j b$ is unknown.

The set of matrices indistinguishable from A at step j is

$$\mathcal{A}^j = \{\tilde A : \tilde A^i b = A^i b, \ i = 1, \dots, j\} , \quad j < n .$$

In other words, at this step $K^{j+1}$ is uniquely determined but $K^{j+2}$ is not (unless j = n - 1).

It  will  be  convenient  to  abbreviate  by  GL  ,  Sym,  and  SPD  the 
classes  of  nonsingular,  symmetric,  and  symmetric  positive  definite  matrices. 

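The reason for introducing the Lanczos basis is that with respect to it any $A \in$ Sym is represented by a tridiagonal matrix $T_n$. We define

$$T_j \equiv \begin{pmatrix}
\alpha_1 & \beta_2 & & \\
\beta_2 & \alpha_2 & \ddots & \\
 & \ddots & \ddots & \beta_j \\
 & & \beta_j & \alpha_j
\end{pmatrix} . \qquad (2\text{-}3)$$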
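The tridiagonal form can be checked on a small case. The sketch below applies Gram-Schmidt to the ordered basis b, Ab, ..., $A^{n-1} b$ and then forms $Q_n^T A Q_n$. The diagonal test matrix (chosen so that Assumption 1 certainly holds), the right hand side, and the helper names are assumptions made for illustration.

```python
import math

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def lanczos_basis(A, b):
    """Gram-Schmidt applied to b, Ab, ..., A^(n-1) b.
    Assumes these vectors are linearly independent (Assumption 1)."""
    n = len(b)
    qs = []
    w = list(b)
    for _ in range(n):
        v = list(w)
        for q in qs:                      # remove components along earlier q's
            c = dot(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        nv = math.sqrt(dot(v, v))
        qs.append([vi / nv for vi in v])
        w = matvec(A, w)                  # next Krylov vector
    return qs

# Hypothetical example: distinct eigenvalues 1..4 and b with all components nonzero.
A = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 0.0],
     [0.0, 0.0, 0.0, 4.0]]
b = [1.0, 1.0, 1.0, 1.0]

Q = lanczos_basis(A, b)
# T[i][k] = q_i^T A q_k, the representation of A in the Lanczos basis.
T = [[dot(qi, matvec(A, qk)) for qk in Q] for qi in Q]
for row in T:
    print(["%8.4f" % t for t in row])
```

Entries of T more than one place off the diagonal come out as rounding-level zeros, while the diagonal and the first off-diagonals play the roles of the $\alpha_i$ and $\beta_i$ in (2-3).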


The preparation is over. We now proceed to the statement and proof of the theorem.

The optimality of MR for a single matrix A has been stated above in (2-2), but MR is optimal in a somewhat broader sense. With respect to $\mathcal{A}^j \cap$ SPD there is no vector in $E^n$ which is better than $x_j$ because of the

THEOREM.  $\displaystyle \min_{y \in E^n} \ \sup_{\tilde A \in \mathcal{A}^j \cap \mathrm{SPD}} \|b - \tilde A y\| = \nu_j$ .

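We prove the theorem by considering a special one-parameter family of matrices $A_\alpha$ and establishing two lemmas. Using the orthonormal basis consisting of the columns of $Q_n$ we have

$$Q_n^T A Q_n = \begin{pmatrix} T_j & \bullet \\ \bullet & U \end{pmatrix} \qquad (2\text{-}4)$$

where the off-diagonal blocks vanish except for the single element $\bullet = \beta_{j+1}$ next to the diagonal, which is known, whereas U stands for the unknown part of $T_n$. Different $\tilde A \in \mathcal{A}^j \cap$ SPD will have different values for U. We define $A_\alpha$ implicitly by

$$Q_n^T A_\alpha Q_n = \begin{pmatrix} T_j & \bullet \\ \bullet & \alpha I_{n-j} \end{pmatrix} \qquad (2\text{-}5)$$

where $\bullet = \beta_{j+1}$.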


LEMMA 1. For all large enough $\alpha$, $A_\alpha \in \mathcal{A}^j \cap$ SPD.

Proof. Recall that $Q_n$ is orthogonal so that $A_\alpha \in$ Sym, by construction. Moreover $A_\alpha \in \mathcal{A}^j$ by comparison of (2-4) and (2-5). More formally, writing $e_i$ for the i-th column of I, we have

$$A_\alpha q_i = Q_j T_j e_i + \delta_{ij} \beta_{j+1} q_{j+1} = A q_i , \quad i \le j .$$

The Kronecker symbol $\delta_{ij}$ is the i-th element of $e_j$. Thus

$$A_\alpha K^i = A K^i , \quad i \le j ,$$

and so $A_\alpha$ and A produce the same $K^i$ for i = 1, ..., j + 1.
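It remains to see when $A_\alpha \in$ SPD. Note that by doing a block triangular factorization we obtain

$$Q_n^T A_\alpha Q_n =
\begin{pmatrix} I_j & 0 \\ e_1 c^T & I_{n-j} \end{pmatrix}
\begin{pmatrix} T_j & 0 \\ 0 & \alpha I_{n-j} - \beta_{j+1}^2 \eta_j e_1 e_1^T \end{pmatrix}
\begin{pmatrix} I_j & c\, e_1^T \\ 0 & I_{n-j} \end{pmatrix}$$

where $c^T = \beta_{j+1} e_j^T T_j^{-1}$ and $\eta_j = e_j^T T_j^{-1} e_j$. In other words $A_\alpha$ is congruent to the direct sum

$$T_j \oplus \mathrm{diag}\{\alpha - \beta_{j+1}^2 \eta_j , \ \alpha, \dots, \alpha\} .$$

Observe that

$$0 < \eta_j \le \|T_j^{-1}\| = 1/\lambda_{\min}(T_j) \le 1/\lambda_{\min}(A) .$$

So $A_\alpha \in$ SPD when $\alpha > \beta_{j+1}^2 / \lambda_{\min}(A)$.  □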


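LEMMA 2. If $y \notin K^j$ then $\|b - A_\alpha y\| \to \infty$ as $\alpha \to \infty$. For each $y \in E^n$ there is an $\bar\alpha$ such that $\|b - A_\alpha y\| \ge \nu_j$ for $\alpha > \bar\alpha$, and equality holds if and only if $y = x_j$.

Proof. Represent y in the Lanczos basis as

$$y = Q_n \begin{pmatrix} s \\ t \end{pmatrix}$$

where $s \in E^j$, $t \in E^{n-j}$. Then, with $b = q_1 \beta_1$,

$$\|b - A_\alpha y\|
= \left\| Q_n \left[ \begin{pmatrix} e_1 \beta_1 \\ 0 \end{pmatrix}
- \begin{pmatrix} T_j & \bullet \\ \bullet & \alpha I_{n-j} \end{pmatrix}
\begin{pmatrix} s \\ t \end{pmatrix} \right] \right\|
= \left\| \begin{pmatrix} e_1 \beta_1 - T_j s - e_j \beta_{j+1} \tau \\ - e_1 \beta_{j+1} \sigma - \alpha t \end{pmatrix} \right\| ,$$

since $Q_n$ is orthogonal, where $\tau = e_1^T t$, $\sigma = e_j^T s$. If $y \notin K^j$ then $t \ne 0$ and

$$\|b - A_\alpha y\| \ge \alpha \, \| t + e_1 \beta_{j+1} \sigma / \alpha \| \to \infty , \quad \text{as } \alpha \to \infty .$$

More precisely for any $y \notin K^j$, $\|b - A_\alpha y\|^2$ is a quadratic in $\alpha$ with leading coefficient $\|t\|^2$ and so exceeds $\nu_j^2$ for all $\alpha$ exceeding some $\bar\alpha$ depending on t and s. When t = 0 then $A_\alpha y = Ay$, we can take $\bar\alpha = 0$, and standard least squares theory states that the minimum residual norm $\nu_j$ is attained when $b - Ay \perp A K^j$. This occurs only when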

s  =  .  In  that  case 


y  ■  •  V  8,14  vj '  leJTjlei|8i  •  D 
Proof of theorem.

If $y \notin K^j$ then by Lemmas 1 and 2,

$$\sup_{\tilde A \in \mathcal{A}^j \cap \mathrm{SPD}} \|b - \tilde A y\| \ \ge \ \sup_{\alpha} \|b - A_\alpha y\| = \infty .$$

If $y \in K^j$ then

$$\sup_{\tilde A \in \mathcal{A}^j \cap \mathrm{SPD}} \|b - \tilde A y\| = \|b - Ay\| \ \ge \ \|b - A x_j\| = \nu_j . \qquad \Box$$

We remark that we have actually proved that MR is optimal over any class larger than SPD, in particular over Sym and GL.

Let us consider briefly what happens when $\alpha$ is not permitted to go to $\infty$. Using the expressions in the proof of Lemma 2 we write $u \equiv e_1 \beta_1 - T_j s$ and find that

$$\|b - A_\alpha y\|^2 = \|u\|^2 - 2 \beta_{j+1} \gamma \tau + \beta_{j+1}^2 (\tau^2 + \sigma^2) + 2 \alpha \beta_{j+1} \sigma \tau + \alpha^2 \|t\|^2$$

where $\gamma = e_j^T u$, $\tau = e_1^T t$, $\sigma = e_j^T s$. It would be unreasonable to demand $\alpha < \|A\|$ and so we see that, unless $\|t\|$ is tiny, i.e., $\phi$ is close to MR, we can have $\|b - A_\alpha y\| > \|A\| \cdot \|t\| >$ the required accuracy. In such cases we still have $k(\phi, A) = n$ and the theory in [1] becomes vacuous.


4.  The Complexity Connection

We describe the principal terms in [1].

The task is to find x in $K^n$ so that, for given A and b, $\|b - Ax\| < \epsilon \|b\|$, for some given fixed $\epsilon$. Thus $\epsilon$ is a parameter in the theory. The information $\{b, Ab, \dots, A^j b\}$ is denoted by $N_j(A, b)$. The class from which A is drawn is F. Our set

$$\mathcal{A}^j = \{\tilde A : \tilde A \in F , \ \tilde A^i b = A^i b , \ i = 1, \dots, j\}$$

is denoted by $V(N_j(A, b))$. Recall, from [3] or [4], that MR is optimal among those algorithms which use $N_j$ and deliver output in the Krylov subspace $K^j$. The springboard for [1] is the removal of this restriction on rival algorithms. We shall say that an algorithm is new if its output using $N_j$ is not in $K^j$. Now we are ready to consider the main concepts in [1].

For any algorithm $\phi$, whose output from $N_j$ is $z_j$, [1] defines

$$k(\phi, A) \equiv \min\{k : \|b - \tilde A z_k\| < \epsilon \|b\| , \ \forall \tilde A \in \mathcal{A}^k\}$$

as the matrix index. It purports to be the cost of $\phi$ as measured by the number of matrix-vector products needed in the worst case. Unfortunately an enlargement in F can increase $k(\phi, A)$ thus robbing $k(\phi, A)$ of that meaning. As shown in Section 3 we may assume that the set $N_j$ is linearly independent for all j < n. When j = n the set must be dependent (it has n + 1 vectors) and MR delivers the exact solution. Our theorem in Section 3 shows that if $F \supseteq$ SPD and $\phi$ is a new algorithm then

$$\sup_{\tilde A \in \mathcal{A}^k} \|b - \tilde A z_k\| = \infty , \quad \text{for all } k < n .$$

Hence $k(\phi, A) \ge n$! For old algorithms $k(\phi, A) \ge k(\mathrm{MR}, A)$. The value of $\epsilon$ is irrelevant.
Next comes the class index of $\phi$,

$$k(\phi, F) = \sup_{A \in F} k(\phi, A) .$$

For new algorithms $k(\phi, F) \ge n$ for $F \supseteq$ SPD and for many smaller classes. Even for MR, k(MR, SPD) = n. Since $k(\phi, A)$ itself depends on F the distinction between $k(\phi, A)$ and $k(\phi, F)$ is obscure.

Traub and Wozniakowski seek to minimize these indices. They define the optimal matrix index by

$$k(A) = \min_{\phi} k(\phi, A)$$

and the optimal class index by

$$k(F) = \max_{A \in F} k(A) .$$

By the remarks made above, if $F \supseteq$ SPD then

$$\min_{\text{new } \phi} k(\phi, A) \ge n$$

while

$$\min_{\text{old } \phi} k(\phi, A) = k(\mathrm{MR}, A) .$$

An algorithm $\Phi$ is called strongly optimal (over F) if

$$k(\Phi, A) = k(A) , \quad \forall A \in F$$

and is called optimal (over F) if

$$k(\Phi, F) = k(F) .$$

We have just seen that MR is strongly optimal over any $F \supseteq$ SPD.

It should be mentioned that [1] does not exhibit this result. Instead a nice proof shows that if F is orthogonally invariant then

$$k(\mathrm{MR}, A) - k(A) \le 1 , \quad \forall A \in F .$$

In addition a very contrived 2 by 2 example is given to show that equality can occur for a peculiar F.

We want to make a comment on the hypothesis that F be orthogonally invariant. One of the attractive features of Krylov information $N_j(A, b)$ is that it is coordinate free. The standard methods and theory are geometric in nature. Any change of basis which preserves the Euclidean norm is permitted. It seems unnatural to introduce matrix classes F which do not share this property with the information. It is hardly surprising that the study made in [1] of tridiagonal matrices shows that "anything can happen."

By studying subclasses of SPD whose members have condition numbers bounded by small enough $\kappa$ it is possible to obtain $k(\mathrm{MR}, A) < n$, $\forall A \in F$ and sometimes k(A) < n. Of course k(MR, A) will then be a complicated function of $\epsilon$ and $\kappa$. This topic has been studied by numerical analysts. They showed how cond(A) can be used to bound k(MR, A). With more information on the spectrum more accurate bounds can be derived.


Unfortunately the theorems in Section 5 on symmetric matrices contain no new insights but are translations into the language of complexity of the known bounds mentioned above. In contrast the results on unsymmetric matrices are new and quite misleading. The authors prove in a dead pan manner that even when cond(A) = 1 it is possible to have k(MR, A) = n. Not a word is said about the fact that for unsymmetric matrices cond(A) is no indicator of the clustering of A's spectrum and so is virtually irrelevant to the convergence rate of MR. Cond(A) measures the spread of singular values, not the eigenvalues.

The theorems in [1] tell us more about Analytic Complexity theory than about iterative methods for solving Ax = b.